ABSTRACT
The United States Army Recruiting Command (USAREC) utilizes the Delayed Entry
Program (DEP) as the foundation for its management of the continuous flow of
recruits into the training base. Though there are many benefits of the DEP, a major 
shortcoming is that some DEP members do not enlist, becoming DEP losses. This is 
costly in terms of valuable resources such as lost recruiter time, and the potential for 
training seats being unfilled. Any effort which assists in reducing DEP loss would be 
a valuable contribution. 

This research models individual level DEP loss using multivariate dichotomous 
logistic regression. Explanatory variables used were individual, demographic, and 
USAREC policy in nature. Modeling efforts used data that were easily accessible to 
USAREC to ensure ease of potential future use. Univariate analysis was conducted on 
candidate explanatory variables prior to model building. The model was built using 
forward and backward stepwise logistic regression. Final model refinement included 
scaling of interval variables and the addition of one interaction term. 

Using statistical tests, the model as a whole was determined to exhibit some lack 
of fit. Closer analysis indicated that the model does perform well across many levels 
of estimated probability of DEP loss. Using USAREC’s red, amber, green DEP loss risk 
classification system, the model appears to have significant predictive powers. The 
model also performed well using this classification system for a validation data set. It 
is concluded that this fitted model could prove useful in supplementing the field
experience of the recruiter in predicting DEP loss risk of individual recruits.
I. INTRODUCTION


The United States Army Recruiting Command (USAREC) 
utilizes the Delayed Entry Program (DEP) as an important 
management tool in ensuring the US Army receives a continuous 
flow of recruits. The Delayed Entry Program provides benefits 
to the recruit and the Recruiting Command alike. A major 
shortcoming of this program is that some newly contracted 
recruits in the DEP pool do not enlist. This attrition process 
is costly in recruiting resources and potentially results in 
training seats being unfilled. This research models the DEP
loss process in an attempt to identify contracts with
relatively high risks of DEP loss.


A. DELAYED ENTRY PROGRAM DESCRIPTION 

The DEP is an enlistment program which allows an
individual to delay entry onto active duty for a period of up 
to 365 days. It is best thought of as a reservation system. 
Qualified applicants are allowed to contract for enlistment at 
a specified time, for particular training, and a guaranteed 
job, for an agreed upon time of service [Ref. 1]. The 
recruiter keeps in close contact with the DEP member to help 
ensure that he remains mentally and physically qualified for 
enlistment, and that he maintains his desire to enlist. DEP
management is any activity that promotes this accession goal
and includes funded and unfunded DEP functions, optional
military training or instruction, and other activities. DEP
Management is quite similar to the initial recruiting process 
in that the initial contract is continuously resold while the 
recruit is in the DEP. [Ref. 2] 

The day when a young person could walk into the
Recruiting Office, sign up, and ship out is gone. With the
arrival of Drug and Alcohol Testing (DAT) in June 1988, the
DEP is the vehicle by which all recruits enter active duty.


B. DEP BENEFITS 

The DEP provides benefits for both the recruit and USAREC. 
The DEP allows the recruit to lock in training, schooling and 
an assignment, many months in advance. A recruit in high 
school can make definitive plans for the future early in his 
senior year. The DEP also allows the recruit a wider range of 
available assignments. The recruiter is able to project out 
one year for available assignments. This is especially 
valuable for the top quality recruit who qualifies for all 
assignments. 

The DEP provides benefits to the US Army because it allows 
for efficient resource management in a business that tends to 
be extremely seasonal. The DEP aids in future planning of 
training availability and personnel requirements. Recruiters 
are able to focus on high quality recruits rather than meeting
short term accession goals. US Navy research efforts indicate
that a large DEP pool may actually assist recruiting [Ref. 1].
This may be due to the promotion incentives offered to DEP 
members who refer candidates who then enlist. In effect, every 
DEP member becomes a recruiter, representing the US Army in 
the high schools and work places, creating a type of recruit 
network. 

Another byproduct of the DEP is that it may result in 
lower first term attrition. One study conducted for the US 
Army in 1985 concluded that the longer the recruit was in the 
DEP the more likely he was to successfully complete his term 
of service. The theory of this study is that a recruit who has 
more time to evaluate his contract decision, and then accesses 
onto active duty, will be more inclined to fulfill his 
contractual obligation [Ref. 3]. A related theory is that 
someone who survives a longer period in the DEP may be more 
committed to begin with, so that a portion of the total
attrition occurs in the DEP rather than after enlistment.


C. DEP SHORTCOMINGS 

The DEP is not without its costs to USAREC. During the 
period a recruit is in the DEP, he may attrite or become a DEP 
loss. A DEP loss may be the result of a myriad of reasons 
ranging from death or serious injury, to apathy, to joining 
another service or National Guard. During the last ten years,
DEP loss has grown from 7% to 13% in FY 89. As of 1
December 1990, approximately 15% of all contracts signed in FY
90 had resulted in DEP losses.¹ Figure 1 depicts the trend over
the last 20 quarters. Large DEP losses significantly
contributed to USAREC not meeting its accession goals in
October and November 1990, the first such shortfall in over seven years.

USAREC Regulation 601-95 states, "DEP loss has a major 
impact on mission accomplishment." A DEP loss must be 
replaced by a new recruit, demanding valuable recruiter 
resources and time. If a DEP loss occurs shortly before the 
accession date, a training seat could remain unfilled. With 
smaller defense budgets, the US Army cannot afford to
underutilize its training resources. In the last year, USAREC
reports that recruiters are finding they must make on
average 12 contacts with potential recruits, versus an average
of 8 in previous years, to secure one enlistment [Ref. 4].
This indicates that it may become even more difficult to
recruit replacements for DEP losses in the future.


D. CURRENT USAREC DEP SYSTEM 

USAREC's command goal is to reduce DEP loss to six percent 
or less of all signed contracts [Ref. 2]. As Figure 1 
indicates, this goal has not been reached in any of the last 
20 quarters and only during two, one month periods in FY 90. 


¹ As of 1 December 1990, approximately 80% of all
contracts signed in FY 90 had resulted in accessions or DEP
losses. The remaining recruits were still awaiting accession
onto active duty or DEP loss.

Figure 1 Cohort DEP Loss by Quarter FY86 - FY90 (percent DEP loss by contract fiscal year and quarter)

USAREC Regulation 601-95 outlines many approved techniques to
help avoid DEP losses. These include: minimum standards for
number of times a recruiter contacts a DEP member, DEP 
incentive programs, and funded DEP events. Currently, 
recruiters rely only on their experience in the field to 
categorize their recruits in the DEP as being high, medium, or 
low DEP loss risks. Recruiters are required to report monthly to
their chain of command their subjective opinion as to the
risk status of their DEP members using the following coding
scheme:


• Green: Indicates the DEP member remains motivated to
access onto active duty and there are no foreseeable
problems.

• Amber: Indicates there may be potential problems with
either motivation or qualification to access onto active
duty.

• Red: Indicates a problem. This DEP member for whatever
reason is a probable or certain DEP loss.


This system of using the field expertise of the recruiter and 
his personal knowledge of each DEP member appears to be 
valuable. USAREC could potentially augment this system with 
quantitative techniques or models to better assist in 
predicting DEP losses. 

Chapter II summarizes the goals of this research and the
general approach that was taken. Chapters III and IV concern
selection of candidate explanatory variables and initial
analysis of these variables. Chapter V details the building of
the model and its refinement. The last three chapters, VI
through VIII, assess the model's fit, explore a possible model
use, and finish with recommendations and conclusions.


II. RESEARCH GOALS 


A. APPROACH 

USAREC maintains a large historical database containing 
extensive information on every contract that is signed
throughout the Command. The approach of this study was to use 
this database and other readily available USAREC data 
resources to develop a DEP attrition model. This approach has 
resulted in quantitative models that should be useful to 
USAREC as supplements to field expertise. Research focused on 
providing the recruiter in the field with a system to 
complement his subjective opinion as to the risk of a DEP 
member becoming a loss. Though certain conclusions were drawn
regarding USAREC DEP policies, this was not the emphasis.


B. PREVIOUS RESEARCH EFFORTS 

Research was conducted on the DEP loss process during the 
1980's. Current USAREC DEP tracking and analysis is aggregated 
at the Recruiting Battalion level to provide early warning in 
case accession goals are in jeopardy. Several studies have 
used time series analysis to predict the rate at which DEP
loss occurs [Ref. 5]. A shortcoming of this approach is that it
assumes DEP losses occur on the date reported in the database.
These dates are then used for developing models of DEP loss
rates. In actuality, this date merely reflects when the
recruiting chain of command officially reported the loss. The
actual date on which the recruit decided to leave the DEP
could have been months prior.

Individual contract level models have been developed but
focused on only those contracts signed by high school seniors
and graduates in the highest mental category.² The most recent
year of recruiting data used in developing these models was FY
88. Our research used data covering all non-prior-service
contracts signed in FY 86 through FY 90. We examined
contributions of the following new areas:


• The 17 - 21 year old population in each Recruiting
Battalion's region

• Military/civilian pay ratios for the Recruiting Battalion

• Total number of Department of Defense recruiters in the
Recruiting Battalion's region

• Recruiting Battalions

• Career Management Field (CMF) of contract

• Renegotiation status of the contract

• Number of recruiters per contract in the Recruiting
Battalion (contract density)

• Brigade (local) and national advertising budgets


The inclusion of these new variables may potentially
result in better predictive power than already
existing models. Additionally, many officials at USAREC
believe the combination of a declining advertising budget,
fewer recruiters in the field, and a dwindling 17 - 21 year
old population has significantly impacted all recruiting
operations over the last five years.³ All three of these
concerns are addressed in the models developed here.

² Nelson, 1988, Army Research Institute and Celeste,
1989, WESTAT.


³ This information was obtained during interviews with
USAREC personnel from 18 November through 21 December 1990 
during an experience tour at USAREC Headquarters, Fort 
Sheridan, IL. 


III. VARIABLE DEVELOPMENT


There are many similarities between the initial selling of
a contract by a recruiter and the reselling that goes on with
a member of the DEP. The recruiter must periodically meet with
the DEP member and resell him on his initial contract. This
recruiting effort receives command emphasis throughout USAREC.
For this reason, many of the same variables used in contract
production models were analyzed for applicability in a DEP
loss model. Explanatory variables can be described as being
either individual, demographic, or policy factors.


A. INDIVIDUAL FACTORS 

Individual factors are the personal characteristics of the 
DEP member. Table I shows the variables that were considered 
for inclusion and their source. These variables represent the 
characteristics of the recruit on the day that the contract 
was signed. USAREC updates the EDUC variable as the DEP 
member's education status changes. Therefore, this value was 
obtained from a previous education code in the database. The 
EDUC variable includes four classes. All education codes
indicating education levels above high school were aggregated
into one class. Likewise, the many types of high school
graduates other than regular diploma graduate were aggregated
into one class. RACE was aggregated into the four numerically
largest races. The category OTHER included the remaining less
populous races.


Table I INDIVIDUAL FACTORS TO BE ANALYZED

VARIABLE     DESCRIPTION                                         SOURCE
RACE         [illegible]                                         [illegible]
EDYRS        YEARS OF EDUCATION                                  USAREC MM
EDUC         STATUS OF HIGH SCHOOL DIPLOMA: EITHER IN HIGH       USAREC MM
             SCHOOL, NON GRADUATE, DIPLOMA GRADUATE, OR OTHER
             TYPE OF GRADUATE
AFQT         ARMED FORCES QUALIFICATION TEST SCORE               USAREC MM
CONTDATE     DATE ON WHICH CONTRACT WAS SIGNED                   USAREC MM
DEPEND       NUMBER OF DEPENDENTS                                USAREC MM

NOTE: 1. USAREC MM is the Minimaster database maintained at USAREC containing information
on all contracts signed during a fiscal year.



B. DEMOGRAPHIC FACTORS 

Demographic factors are the characteristics of the 
geographic region in which the recruit lived when the contract 
was signed. Table II describes these variables and their 
sources. Quarterly data were used to calculate these 
variables. When monthly data were available, as in the MISSION 


and DOD variables, the quarter's mean was used. The demographic
variables are measured at the Recruiting Battalion level. PAYRATE
was not indexed for inflation. Since civilian median income
and E-2 pay increased separately, the ratio of these two
incomes was the explanatory variable used. Of the 55
Recruiting Battalions, the San Juan Battalion was eliminated
from the study due to lack of demographic data.


Table II DEMOGRAPHIC VARIABLES TO BE ANALYZED

VARIABLE     DESCRIPTION                                         SOURCE
[illegible]  LOCAL UNEMPLOYMENT RATE IN THE RECRUITING           SUPERSITE
             BATTALION IN THE QUARTER IN WHICH THE CONTRACT
             IS SIGNED
BN           RECRUITING BATTALION (54 CONSIDERED)                USAREC MM
MISSION      RECRUITING BATTALION RATIO:                         USAREC MM /
             MILITARY AVAILABLE 17-21 YEAR OLDS /                BERLIANT
             NUMBER OF CONTRACTS
PAYRATE      RECRUITING BATTALION RATIO:                         SUPERSITE /
             CIVILIAN MEDIAN INCOME /                            US ARMY FINANCE
             E-2 UNDER 2 YEARS PAY
DOD          RECRUITING BATTALION RATIO:                         USAREC PAE
             MILITARY AVAILABLE 17-21 YEAR OLDS /
             MEAN NUMBER OF DOD RECRUITERS

NOTE: 1. Supersite is the DOD Manpower Data Center's Supersite Demographic Database;
USAREC MM is the USAREC Minimaster database; Berliant is an Army Research Institute study (Ref.
6); USAREC PAE is the USAREC Program Analysis and Evaluation Directorate.

The MISSION variable was used to represent contract
density in each region. Because the available population is
the numerator of this ratio, a small value indicates a high output
Recruiting Battalion relative to its available population
base. It also might indicate a propensity of candidates in the
region to join the US Army.
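
The construction of a battalion-level ratio from monthly data can be sketched as follows. This is a minimal illustration, assuming the quarterly figure is the mean of three monthly counts and that the ratio divides the 17 - 21 year old population by that mean; the function names and all numbers are hypothetical and are not taken from USAREC data.

```python
def quarterly_mean(monthly_values):
    """Mean of the three monthly values in one fiscal quarter."""
    if len(monthly_values) != 3:
        raise ValueError("a fiscal quarter has exactly three months")
    return sum(monthly_values) / 3.0


def population_ratio(population_17_21, monthly_counts):
    """Available 17-21 year old population divided by the quarter's mean count.

    With monthly contract counts this mirrors the MISSION definition;
    with monthly DOD recruiter counts it mirrors the DOD definition.
    """
    return population_17_21 / quarterly_mean(monthly_counts)


# Hypothetical battalions with the same population but different output.
mission_a = population_ratio(120_000, [310, 290, 300])
mission_b = population_ratio(120_000, [250, 230, 240])
print(mission_a, mission_b)
```

The same helper applies to the DOD variable by passing monthly recruiter counts in place of contract counts.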




The DOD variable was included to allow for the presence of
Department of Defense recruiters. Small values in this
variable would represent competition from the other services
for the available recruit population. Many USAREC officials
postulate that there is an increased propensity to join the US
Army when any service is well represented in a region.


C. POLICY FACTORS 

Policy factors are those characteristics of the contract 
that are dependent on USAREC policies current at the time the 
contract was signed. Table III describes these factors and 
their sources. Note that the TIMEDEP variable is the
contracted time to be in the DEP, not the actual time.


Table III POLICY VARIABLES TO BE ANALYZED

VARIABLE     DESCRIPTION                                         SOURCE
TIMEDEP      TIME CONTRACTED TO BE IN THE DEP                    USAREC MM
BONUSAMT     AMOUNT OF BONUS (IF ANY)                            USAREC MM
[illegible]  BINARY VARIABLE INDICATING IF A CONTRACT            USAREC MM
             RENEGOTIATION OCCURRED WHILE IN THE DEP
ACF          INDICATES IF THE RECRUIT IS AN ARMY COLLEGE         USAREC MM
             FUND TAKER
CMF          CAREER MANAGEMENT FIELD (31 AVAILABLE)              USAREC MM
TERM         TERM OF CONTRACTED ENLISTMENT                       USAREC MM
CONPER       CONTRACTS PER RECRUITER FOR THE QUARTER IN THE      USAREC PAE
             RECRUITING BATTALION
BDEADV       BRIGADE LOCAL ADVERTISING BUDGET FOR THE FISCAL     USAREC APAD
             YEAR AND RECRUITING BRIGADE
NATADV       NATIONAL ADVERTISING BUDGET FOR FISCAL YEAR         USAREC APAD

NOTE: 1. USAREC MM is the Minimaster database; USAREC PAE is USAREC Program Analysis and
Evaluation Directorate; USAREC APAD is USAREC Advertising and Public Affairs Directorate.


As with demographic factors, CONPER is the quarterly mean with respect
to both number of contracts and the number of recruiters. Data
were aggregated at the Recruiting Battalion level. The BDEADV
and NATADV advertising variables were indexed to FY 86 dollars
using USAREC Advertising and Public Affairs Directorate
advertising price indexes.
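
Indexing a budget to FY 86 dollars amounts to dividing by the price index of the budget year and multiplying by the index of the base year. The sketch below illustrates that arithmetic only; the index values and the budget figure are invented for the example, and the actual USAREC advertising price indexes are not reproduced here.

```python
def to_base_year_dollars(nominal_budget, price_index, base_index):
    """Convert a nominal budget to base-year dollars with a price index."""
    return nominal_budget * base_index / price_index


# Hypothetical advertising price index with FY 86 as the base year (= 100).
hypothetical_index = {86: 100.0, 87: 104.0, 88: 110.0}

budget_fy88_nominal = 1_100_000.0  # invented figure, not USAREC data
budget_fy88_real = to_base_year_dollars(
    budget_fy88_nominal, hypothetical_index[88], hypothetical_index[86]
)
print(round(budget_fy88_real))
```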


D. DATABASE 
1. Sources 

As shown in Tables I through III, the USAREC 
Minimaster database was the primary source of data for this 
model development. These records are year end pictures of all 
recruiting contract activity during the fiscal year. Contracts 
are represented on successive fiscal year Minimaster files 
until the contract is closed by either accession or DEP loss. 
For example, a contract signed in FY 86 with an accession or
DEP loss in FY 87 would be on both Minimaster 86 and 87
databases. Minimaster 86 would indicate this as an open
record. Then, Minimaster 87 would contain the accession or
DEP loss that closed the contract.

Minimaster 86 did not include the bonus amount of the 
contract but only whether one was received. Using historical 
bonus information from USAREC Recruiting Operations 
Directorate, these data were reconstructed. 

Information regarding US Army and DOD recruiter field
strength and advertising budgets was obtained from
directorates at USAREC Headquarters. DOD Manpower Data Center
(DMDC) provided the employment and civilian median income 
information for each Recruiting Battalion. DMDC subcontracted 
to provide USAREC with a Supersite system which aggregates 
county level economic data to Recruiting Battalion level [Ref. 
7]. The source for the 17 - 21 year old prime recruiting 
market at the battalion level was a 1989 Army Research 
Institute study conducted by Kenneth R. Berliant [Ref. 6]. 
2. Database Development 

The Statistical Package for the Social Sciences (SPSS) was
used for screening, sorting, and merging the Minimaster
records in preparation for model development. This statistical
package was used because of its widespread use at USAREC, which
should assist any future updating of the model as data become
available. Table IV details the results of the database after
screening for unwanted records and data errors. A total of
247,592 records were eliminated as being open, prior service,
from the San Juan Battalion, or contracts signed before FY 86.
Open records were not closed out in the given fiscal year as
a result of accession or DEP loss. They were then repeated and
closed out in the following fiscal year. Approximately 3.5% of
the records were eliminated due to coding errors in the data.
Because of the large size of the database, 715,668 records, we
did not expect this to significantly bias the data or the
analysis results. Analyses indicated that the eliminated
records possessed approximately the same percentage of DEP
losses as the entire contract population.

After the Minimaster files were screened and 
concatenated, the demographic and policy variables containing 
quarterly values were merged to create the final large 
database. There were 689,278 contract records available, each 
containing DEP loss status and values of 24 candidate
explanatory variables.

Table IV RESULTS OF DATABASE SCREENING

RECORDS INITIALLY AVAILABLE
  MINIMASTER FY86                                208,504
  MINIMASTER FY87                                206,326
  MINIMASTER FY88                                192,048
  MINIMASTER FY89                                193,682
  MINIMASTER FY90                                162,700
  TOTAL                                          963,260

RECORDS ELIMINATED
  OPEN RECORDS (NOTE 1)                          112,293
  PRIOR SERVICE RECORDS                          [illegible]
  CONTRACTS SIGNED IN FY85                        60,680
  RECORDS FROM SAN JUAN BATTALION                  8,418
  TOTAL                                          247,592

RECORDS ELIMINATED DUE TO ERRORS IN DATA
  (error category labels illegible)               12,467
                                                   4,846
                                                   2,195
                                                   1,907
                                                   1,947
                                                   1,716
                                                     579
                                                     512
                                                     130

RECORDS AVAILABLE FOR ANALYSIS                   689,278

NOTES: 1. Open records have not been closed out in the given fiscal year as a result of
accession or DEP loss. They are then repeated and closed out in the following fiscal year.
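
The record counts reported for the screening can be reconciled with a few lines of arithmetic. The sketch below uses only figures stated in the text and in Table IV; the variable names are illustrative, and the prior service count is derived from the stated totals rather than read from the table, so it should be treated as an implied value.

```python
# Records initially available on each Minimaster file (Table IV).
initial = {"FY86": 208_504, "FY87": 206_326, "FY88": 192_048,
           "FY89": 193_682, "FY90": 162_700}
total_initial = sum(initial.values())

# Screening totals stated in the text.
eliminated_total = 247_592
known_eliminated = {"open": 112_293, "signed_fy85": 60_680, "san_juan": 8_418}

# Implied by subtraction; not legible in the table itself.
implied_prior_service = eliminated_total - sum(known_eliminated.values())

after_screening = total_initial - eliminated_total
available_for_analysis = 689_278
removed_for_errors = after_screening - available_for_analysis
error_share = removed_for_errors / after_screening

print(total_initial, after_screening, implied_prior_service)
print(removed_for_errors, round(100 * error_share, 2))
```

The subtraction reproduces the 715,668 records quoted in the text before the removal of records with coding errors.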
IV. DATA SUMMARY


A. DEP LOSS TRENDS 

An initial analysis with data in the DEP loss database
concerned possible seasonal effects on DEP losses during the
Recruiting year. Two methods were used to calculate the DEP
loss percentages. The first method, shown in Figure 2, was by
contract cohort. Contracts for the months of FY 86 through
FY 90 were tracked as a cohort. Percent DEP loss is the
percentage of this cohort that resulted in a DEP loss.
There did not appear to be any strong recurring seasonal
trend. The significant increase in DEP loss in the spring of
1988 was a result of a one time DEP forgiveness program
instituted by USAREC in response to accession cutbacks.

Figure 2 Contract Cohort DEP Loss Analysis (percent DEP loss by contract month, OCT through SEP, for fiscal years 86 through 90)

The second method for examining DEP loss was by accession
cohort. The accession status of all recruits that were
projected to access in the months of FY 86 through FY 90 was
tracked. The percent of the accession cohorts that resulted in
DEP loss is depicted in Figure 3.

Figure 3 Projected Accession Cohort DEP Loss Analysis (percent DEP loss by projected accession month, OCT through SEP, for fiscal years 86 through 90)



There appeared to be a trend for higher DEP losses in
spring, March through May, during each of the five fiscal
years. This may have been a result of high school seniors who
signed contracts early in the year. They then may have changed
either education or career goals in the spring. Since there
appeared to be a seasonal trend, a dummy variable for
projected accession month was included in the model
development.
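
One common way to represent projected accession month in a regression is a set of indicator (dummy) variables with one month omitted as the reference level. The sketch below illustrates that coding; the reference month, the `ACC_` prefix, and the function name are assumptions made for the example and are not taken from the study.

```python
FISCAL_MONTHS = ["OCT", "NOV", "DEC", "JAN", "FEB", "MAR",
                 "APR", "MAY", "JUN", "JUL", "AUG", "SEP"]


def month_dummies(month, reference="OCT"):
    """Return 11 indicator values for a month, omitting the reference month."""
    if month not in FISCAL_MONTHS:
        raise ValueError(f"unknown month: {month}")
    return {f"ACC_{m}": int(m == month)
            for m in FISCAL_MONTHS if m != reference}


april = month_dummies("APR")
october = month_dummies("OCT")  # reference month: every indicator is zero
print(april["ACC_APR"], sum(october.values()))
```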


B. INTERVAL VARIABLES 

Fourteen of the 23 initial explanatory variables were
interval (scale) variables. Using SPSS, initial analyses were
conducted to determine if there were significant differences
between the two groups, accession and DEP loss, with respect
to these variables. The mean values for the two groups are
listed in Table V. The t-test is used as a basis for rejecting
or failing to reject the null hypothesis that the two population
means are equal. Due to the large sample size (689,278), the
t-test does not require that the samples come from a Normal
population. With t-test significance levels below .00005 for
these interval variables, there is less than a .005% chance that
such sample means would be this different if the population
means were equal. We acknowledge that with a sample this large
the null hypothesis will almost always be rejected.
Beyond the statistical significance indicated, we believe there
is also practical significance in the difference of these means.
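
The effect of sample size on the t-test can be illustrated with the AGE means from Table V. The sketch below computes a two-sample t statistic without assuming equal variances (Welch's form). The standard deviation of 2.0 years is an assumption made purely for illustration, since the source does not report standard deviations, and the group sizes are approximated from the 689,278 records and the 11.83% overall DEP loss rate shown in Table VI.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sample t statistic that does not assume equal variances."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


n_total = 689_278
n_loss = round(0.1183 * n_total)   # approximate number of DEP losses
n_access = n_total - n_loss        # approximate number of accessions

ASSUMED_SD = 2.0  # assumption: not reported in the source

# Same difference in mean age, evaluated at two very different sample sizes.
t_large = welch_t(19.972, ASSUMED_SD, n_access, 19.7859, ASSUMED_SD, n_loss)
t_small = welch_t(19.972, ASSUMED_SD, 50, 19.7859, ASSUMED_SD, 50)
print(round(t_large, 1), round(t_small, 2))
```

Under these assumptions the same difference of about 0.19 years yields a very large statistic at the study's sample size and a small one with 50 records per group, which is why practical significance has to be judged separately.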

Table V INTERVAL VARIABLE ANALYSIS

INTERVAL                                                          ACCESSION     DEP LOSSES
VARIABLE     DESCRIPTION                                          CONTRACTS     (NOTE 3)
AGE          AGE IN YEARS ON CONTRACT DATE                        19.972        19.7859
EDYRS        YEARS OF EDUCATION                                   [illegible]   [illegible]
AFQT         ARMED FORCES QUALIFICATION TEST PERCENTILE SCORE     [illegible]   59.7147
[illegible]  [illegible]                                          23.27 (column not recoverable)
[illegible]  LOCAL (BN) UNEMPLOYMENT RATE AT TIME OF CONTRACT     [illegible]   [illegible]
MISSION      RATIO: MILITARY AVAIL 17-21 YEAR OLD (BN) /          394.65        412.83
             NUMBER OF CONTRACTS (BN)
PAYRATE      RATIO: CIVILIAN MEDIAN INCOME (BN AREA) /            2.872         2.937
             MILITARY PAY (E-2 UNDER 2 YEARS)
CONPER       RATIO: NUMBER OF CONTRACTS (BN) /                    [illegible]   [illegible]
             MEAN # OF RECRUITERS ASSIGNED (BN)
DOD          RATIO: MILITARY AVAIL 17-21 YEAR OLD (BN) /          767.85        [illegible]
             MEAN # OF DOD RECRUITERS (BN)
BDEADV       LOCAL ADVERTISING BUDGET FOR THE FISCAL YEAR         890,607       872,658
NATADV       USAREC NATIONAL ADVERTISING BUDGET FOR FISCAL        65,093,198    63,654,535
(NOTE 2)     YEAR

NOTES: 1. Variables are calculated using data for quarter in which contract was signed. 2.
Variables are calculated for fiscal year in which contract was signed. 3. T-test significance less
than .00005

TERM is the only variable for which the practical
significance appears questionable.

The mean values for these interval variables give some
insight into the DEP loss contract holder, compared to those
who access. The DEP loss is slightly younger and has fewer
years of education because he may be more likely to still be
in high school. His AFQT score is higher than that of the average
contract, which may indicate more opportunities. His contract
term of service is longer and he gets less than an average
bonus amount. He has fewer dependents to worry about and is
planning on spending much more than average time in the DEP
awaiting accession onto active duty. The economic situation in
his Recruiting Battalion region is better than average as
indicated by lower unemployment and better civilian pay. There
is less contract density in his Recruiting Battalion region.
There are more DOD recruiters in his region than average.
USAREC spends less on advertising in his region of the
country.

The CONPER values appeared counterintuitive. The number
of contracts per recruiter was lower for DEP loss contract
holders. This may indicate that high mission recruiters tended
to have fewer DEP losses. This phenomenon may be due to USAREC's
Recruiting Zone Analysis (RZA) that assigns recruiters and
missions to Recruiting Battalions. This could indicate that
high propensity regions as determined by RZA suffer fewer DEP
losses.

As previously mentioned, the large database assisted in
increasing the significance of these t-tests. This may have
overemphasized their explanatory value as covariates in
attrition models. Even so, these interval variables appeared
significant in the univariate analyses and were included as
candidate explanatory variables in the modeling process.


C. CLASS VARIABLES 

The remaining nine explanatory variables were categorical 
or class variables. Again, using SPSS, cross tabulations with 
Chi-Square tests were conducted to determine if DEP loss 
status was independent of the class variables. Table VI lists 
the first seven class variables and Appendix A, Tables XIII 
through XVI list the class variables with larger numbers of 
levels, Career Management Field (CMF) and Recruiting 
Battalion. The results of the Chi-Square tests indicated that 
all the class variables were highly significant. As with the 
interval variables, there is less than a .005% chance that 
such distributions would have occurred if DEP loss status was 
independent of these class variables. 
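
The chi-square test of independence described above can be sketched for a single two-level class variable. The example below uses the male and female percentages shown in Table VI (85.5% of accessions and 78.24% of DEP losses were male); the cell counts are approximations rebuilt from those percentages and from group sizes implied by the 11.83% overall DEP loss rate, so they are assumptions rather than the study's actual cross tabulation.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column classifications were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


n_total = 689_278
n_loss = round(0.1183 * n_total)
n_access = n_total - n_loss

male_access = round(0.855 * n_access)
male_loss = round(0.7824 * n_loss)
sex_by_status = [
    [male_access, n_access - male_access],  # accessions: male, female
    [male_loss, n_loss - male_loss],        # DEP losses: male, female
]

chi2_sex = chi_square_statistic(sex_by_status)
chi2_independent = chi_square_statistic([[10, 20], [20, 40]])
print(round(chi2_sex), chi2_independent)
```

With counts this large, even a difference of a few percentage points produces a statistic in the thousands, while a table whose rows are proportional gives zero.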

Initial analyses indicated that marital status, sex, 
education level, and contract renegotiation status were the 
more significant explanatory class variables. Several of the 
CMF's and Recruiting Battalions appeared to be strong 
explanatory variables. CMF 00 had a 99.4% DEP loss rate. 
According to USAREC Recruiting Operations Directorate, this is 
not a valid CMF. It was used in FY 87 and FY 88 as a surrogate 
CMF for known DEP losses who were not officially dropped for 


an extended period. This use of CMF 00 freed the previously
reserved CMF to be used for another contract. Rather than
deleting these records and losing the data, we retained them
and dealt with them during model development.


Table VI CLASS VARIABLE ANALYSIS

CLASS                                                        PERCENT       PERCENT
VARIABLE     DESCRIPTION                                     ACCESSION     DEP LOSS
MARITAL      MARITAL STATUS AT TIME OF CONTRACT
..MARRIED    9.6% MARRIED                                    [illegible]   [illegible]
..SINGLE     90.4% SINGLE / NOT MARRIED                      [illegible]   95.44

[illegible]  [illegible] (sex)
..MALE       84.6% MALE                                      85.5          78.24
..FEMALE     15.4% FEMALE                                    14.5          21.76

RACE         [illegible]
..WHITE      [illegible]                                     [illegible]   [illegible]
..BLACK      [illegible]                                     24.9 (column not recoverable)
[illegible]  [illegible]                                     2.34 (column not recoverable)
[illegible]  [illegible]                                     1.2 (column not recoverable)
[illegible]  2.0% OTHER / UNKNOWN                            [illegible]   [illegible]

EDUC         EDUCATION CODE AT CONTRACT
..SENIOR     29.7% IN SCHOOL                                 [illegible]   [illegible]
..NONGRAD    3.8% NON-GRADUATE HIGH SCHOOL                   [illegible]   [illegible]
..DIPGRAD    62.5% DIPLOMA GRAD HIGH SCHOOL                  [illegible]   48.1
..OTHGRAD    4.1% OTHER TYPE GRAD HIGH SCHOOL                4.14          3.4

ACF          ARMY COLLEGE FUND TAKER
..TAKER      18.9% ACF TAKERS                                19.04         17.63
..NOTAKER    81.1% NOT ACF TAKERS                            80.96         82.37

[illegible]  RENEGOTIATION OF CONTRACT IN DEP
..YESRENO    8.9% OF CONTRACTS RENEGOTIATED                  8.22          13.95
..NORENO     91.1% NOT RENEGOTIATED                          91.78         86.05

RECFY        RECRUITING FISCAL YEAR IN WHICH CONTRACT
             WAS SIGNED
..86         [illegible]                                     [illegible]   [illegible]
..87         [illegible]                                     [illegible]   [illegible]
..88         19.0% SIGNED IN FY88                            18.64         21.?6
..89         20.3% SIGNED IN FY89                            20.05         22.13
..90         10.1% SIGNED IN FY90                            [illegible]   [illegible]

TOTAL        TOTAL CONTRACT PERCENTAGES                      88.17         11.83

NOTES: 1. Cell difference significance less than .00005 Chi-square test. 2. Class variable
analysis for Career Management Field (CMF) and Battalions see Appendix A, Tables XIII through XVI.

The results of the data assessment process justified 
inclusion of the 23 candidate explanatory variables. It also 
revealed that due to a seasonal trend, the projected accession 
month may be a strong explanatory variable. In our model
development we attempted to use these 24 interval and
categorical variables to predict DEP loss.
V. MODEL DEVELOPMENT


A. MODEL SELECTION 

Empirically, the individual process of attrition from the 
DEP is represented by a dichotomous (binary) dependent 
variable which categorizes individuals either as accessions or 
DEP losses. The dependent variable definition is as follows: 


$$Y_i = \begin{cases} 0, & \text{if individual } i \text{ accesses into the US Army} \\ 1, & \text{if individual } i \text{ is a DEP loss.} \end{cases}$$


Logit models are particularly well suited for dichotomous 
dependent variables because the logistic distribution lends 
itself to a meaningful interpretation. For notational 
purposes, the quantity: 


$$\pi(X) = E(Y \mid X) \qquad (1)$$


is used to represent the conditional mean of Y (DEP loss or 
accession) given the covariates X (explanatory variables). 
The specific form of logistic regression model we used is
as follows:

$$\pi(X) = E(Y \mid X) = \frac{e^{g(X)}}{1 + e^{g(X)}} \qquad (2)$$


where $g(X)$ is the linear combination:

$$g(X) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p \qquad (3)$$

where $p$ is the number of covariates, $x_j$, $j = 1, \ldots, p$, are the
covariates, $X = (x_1, x_2, \ldots, x_p)$, $\beta_0$ is the constant parameter,
and $\beta_1, \ldots, \beta_p$ are the coefficient parameters.


The conditional mean in equation (1) is bounded in value
by zero and one because of the fraction on the right hand side
of equation (2). The usefulness of logistic regression is
that the value $\pi(X)$ may be interpreted as the probability of
being a DEP loss (Y=1) given explanatory variables X, or
P(Y=1|X).

The logit transformation used in the fitting of the model
is:

    g(X) = \ln\left[ \frac{\pi(X)}{1 - \pi(X)} \right]            (4)

This logit, g(X), is linear in its parameters and is a continuous
variable ranging in value from negative infinity to infinity.
In order to estimate the value of $\pi(X)$, the parameters $\beta_0$
through $\beta_p$ from equation (3) must be estimated using the
method of maximum likelihood. [Ref. 8:p. 1-11]
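
Equations (2) and (4) are inverses of one another. The short Python sketch below illustrates this relationship; the function names `logistic` and `logit` are illustrative choices and do not come from the thesis or from SPSS.

```python
import math

def logistic(g: float) -> float:
    """Equation (2): map a logit g(X) to a probability pi(X) in (0, 1)."""
    return math.exp(g) / (1.0 + math.exp(g))

def logit(pi: float) -> float:
    """Equation (4): map a probability pi(X) back to the logit g(X)."""
    return math.log(pi / (1.0 - pi))

# A logit of zero corresponds to even odds of DEP loss.
print(logistic(0.0))
# Applying equation (4) to the result of equation (2) recovers the logit.
print(logit(logistic(1.5)))
```

Because the fraction in equation (2) always lies strictly between zero and one, any real-valued linear combination g(X) yields a valid probability.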

The method of maximum likelihood uses the known
covariates, X, to compute the estimates for $\beta_0$ through $\beta_p$ so
as to maximize the likelihood of obtaining the observed DEP
loss status (Y = 0 or 1). For a sample of size n, let $y_i$ and
$X_i = (x_{1i}, x_{2i}, \dots, x_{pi})$ be the observed DEP loss status and vector
of corresponding covariates for individual i, i=1,...,n. The
likelihood (normal) equation resulting from the method of
maximum likelihood for $\beta_0$ is:

    \sum_{i=1}^{n} \left[ y_i - \pi(X_i) \right] = 0                      (5)

Similarly, the normal equations for $\beta_1$ through $\beta_p$ are:

    \sum_{i=1}^{n} x_{ji} \left[ y_i - \pi(X_i) \right] = 0 , \quad j = 1, 2, \dots, p     (6)
The value of the vector $\beta = (\beta_0, \beta_1, \dots, \beta_p)$ given by the solution
of these p+1 equations is $\hat{\beta}$, the maximum likelihood estimator
for $\beta$. The values for the estimated probability of DEP loss
are obtained from equations (2) and (3) by replacing $\beta$ with $\hat{\beta}$.
The estimated probability of DEP loss is denoted $\hat{\pi}$. An
interesting result of equation (5) is the following:

    \sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{\pi}(X_i)

The sum of the n observed values, $y_i$, is equal to the sum of
the n predicted (expected) values, $\hat{\pi}_i$. This property of
logistic regression was exploited in our assessment of the fit
of the model. The solution of the normal equations above is
found by an iterative process which has been programmed into
many available logistic regression computer software packages
such as SPSS. The development and rationale for this model are
given in Reference 8, pages 8-11.
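
The iterative solution of the normal equations can be sketched for the simplest case of a constant and a single covariate. The Python example below uses Newton-Raphson, which is one common choice and is an assumption here; the text does not state which iterative scheme SPSS uses. The data are made up for illustration, and `fit_logistic` is a hypothetical name.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Newton-Raphson solution of normal equations (5) and (6), one covariate."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Left-hand sides of equations (5) and (6) at the current estimates.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Entries of the 2x2 information matrix, with weights pi * (1 - pi).
        w = [pi * (1.0 - pi) for pi in p]
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        d = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * d - b * b
        # Newton step: add (information matrix)^-1 times the score vector.
        b0 += (d * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return b0, b1

x = [0, 1, 2, 3, 4, 5, 6, 7]   # made-up covariate values
y = [0, 0, 1, 0, 1, 1, 0, 1]   # made-up DEP loss indicators (1 = DEP loss)
b0, b1 = fit_logistic(x, y)
p_hat = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
# Equation (5) implies the observed and expected totals agree at the solution.
print(sum(y), round(sum(p_hat), 6))
```

With p covariates the same step requires solving a (p+1)-dimensional linear system at each iteration, which is consistent with the heavy computing times reported later in this chapter.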


B. MODEL BUILDING 
SPSS, version 4.0, Logistic Regression Procedure was used 
to fit the model. This procedure required recoding of the 
class (categorical) variables. The following class variables 
with two levels were recoded (0,1) to indicate the presence of 
an attribute: MARITAL (married=1), SEX (female=1), ACF 
(yes=1), and RENO (yes=1). The other six class variables were 
recoded using the deviation coding scheme [Ref. 9:p. 55]. The 
number of new dummy variables required to represent a class 
variable with n levels is n-1. For the deviation coding
scheme, if any of the first n-1 levels of a class variable was
present, its corresponding new dummy variable was assigned the
value of one. Otherwise, the new dummy variable was assigned
the value of zero. In order to represent the presence of the
nth level of a class variable, all the n-1 new dummy variables
were assigned the value of negative one. This resulted in the
creation of 105 new variables to represent RACE, EDUC, RECFY,
BN, CMF, and PADDMO.
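
The deviation coding scheme described above can be sketched in Python as follows. The function name `deviation_code` and the three-level example variable are hypothetical and serve only to illustrate the rule.

```python
def deviation_code(level, levels):
    """Return the n-1 deviation-coded dummy variables for one observation.

    `levels` lists the n levels of the class variable; the last entry plays
    the role of the nth level, which is coded as -1 on every dummy variable.
    """
    if level not in levels:
        raise ValueError(f"unknown level: {level!r}")
    if level == levels[-1]:
        return [-1] * (len(levels) - 1)
    return [1 if level == other else 0 for other in levels[:-1]]

# Hypothetical three-level class variable (labels are illustrative only).
levels = ["A", "B", "C"]
for lv in levels:
    print(lv, deviation_code(lv, levels))
```

A class variable with n levels therefore always contributes n-1 columns to the model, whichever level an individual case takes.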
1. Variable Selection 
SPSS's Logistic Regression procedure has the
capability of executing stepwise variable selection. We used
the forward stepwise selection as a basis for building our
model. The algorithm commenced with only the constant term in
the model. Then, the variable with the lowest significance
level for the Score statistic, provided it was lower than the
chosen cutoff value $P_{in}$, was entered into the model. The Wald
statistic's significance level was used to examine variables
for possible elimination [Ref. 9:p. 56]. If the Wald
statistic's significance level was higher than $P_{out}$, the
variable was eliminated from the model. If no variable met the
elimination criteria, the next eligible variable was added.
This process continued until either a previously selected
model was encountered or there were no further variables
meeting the entry or removal criteria. Dummy variables
representing the different levels of a class variable entered
or were removed from the model as a group. [Ref. 9:p. 56-57]

Hosmer and Lemeshow [Ref. 8:p. 88] suggest the use of
$P_{in}$ = .15 and $P_{out}$ = .20 as the best criteria for use in
stepwise logistic regression using the Wald statistic. These
criteria were aimed at selection of important variables for
the model while also providing a parsimonious model.
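
The control flow of the forward stepwise procedure can be sketched as below. This is only an outline under stated assumptions: `entry_sig` and `removal_sig` are hypothetical callbacks standing in for the significance levels of the Score and Wald statistics, which a real implementation would obtain by refitting the model, and class variables are treated as single units.

```python
def stepwise_select(candidates, entry_sig, removal_sig, p_in=0.15, p_out=0.20):
    """Forward stepwise selection with removal checks after every entry."""
    model = []
    seen = {frozenset()}
    while True:
        # Entry step: lowest Score-statistic significance level below p_in.
        eligible = [(entry_sig(model, v), v) for v in candidates if v not in model]
        eligible = [(s, v) for s, v in eligible if s < p_in]
        if not eligible:
            break
        model.append(min(eligible)[1])
        # Removal step: drop variables whose Wald significance exceeds p_out.
        while True:
            weak = [(removal_sig(model, v), v) for v in model]
            weak = [(s, v) for s, v in weak if s > p_out]
            if not weak:
                break
            model.remove(max(weak)[1])
        # Stop if this set of variables has already been selected once.
        if frozenset(model) in seen:
            break
        seen.add(frozenset(model))
    return model

# Made-up significance levels that do not depend on the current model.
sig = {"AGE": 0.01, "SEX": 0.10, "MISSION": 0.40}
chosen = stepwise_select(list(sig), lambda m, v: sig[v], lambda m, v: sig[v])
print(chosen)
```

In this toy run MISSION never enters because its significance level exceeds the entry cutoff, which mirrors the kind of exclusion described in the text without reproducing the actual statistics.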

Due to the computationally intensive nature of the 
iterative algorithms used to fit the model, combined with the 
numerous models built in forward stepwise regression, only a 
random 10% sample (68,962 cases) of the database was used in
variable selection. This sample size required nearly 24 hours
of CPU time on an Amdahl 5990-500 mainframe computer.

Variable selection resulted in all variables meeting
the $P_{in}$ / $P_{out}$ criteria except two: these variables, MISSION and
RECFY, were excluded from the model. The MISSION (contract
density) variable's exclusion may have been a result of 
Recruiting Zone Analysis (RZA) used in assigning contract 
quotas to the Recruiting Battalion. RZA uses many of the same 
explanatory variables as our fitted model to determine each 
Recruiting Battalion's contract density. Therefore, this 
MISSION variable may not have provided the fitted model with 
information not already supplied by other explanatory 
variables. The non-selection of RECFY (Recruiting Fiscal Year) 
by the stepwise procedure may indicate that there was not a 
strong yearly influence on DEP loss that was not represented 
by one of the other chosen explanatory variables. This 
exclusion could prove to be helpful in future prediction uses 
of the model. 
2. Interaction Terms 

Univariate analyses and insight into the recruiting 
environment suggested that consideration of certain 
interaction terms was appropriate. A dozen interaction terms 
including combinations of RACE, EDUC, DEPEND, SEX, and MARITAL 
were considered. Only the RACE by EDUC interaction term was
significant with respect to $P_{in}$ in the stepwise procedure. The
inclusion of this interaction term did not result in the
removal or entry of any previously selected or non-selected
variables.

3. Scaling 

The continuous scaled interval variables were checked 
for the assumption of linearity in the logit, g(X), in 
equation (3). To this point all the interval variables, less 
MISSION, were identified as significant. Scaling assisted in 
obtaining the correct parametric relationship during the model 
refinement stage. We used the Box-Tidwell transformation to 
evaluate the need for scaling [Ref. 8:p. 90]. This simple
technique adds a term of the form x·ln(x) to the model for
each continuous scaled interval variable. If the coefficient
of one of these new variables appeared significant, there was
evidence of non-linearity in the logit.

This technique resulted in six of the thirteen
selected interval variables, EDYRS, TIMEDEP, AGE, UNEMP, CONPER,
and DOD, indicating possible non-linearity. This technique
could not be used for BONUSAMT and DEPEND because they
included many values of zero, for which ln(x) is undefined.
Therefore, these two variables were also included for further
analysis.
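
The extra term used in the Box-Tidwell check can be sketched as follows. The helper names and the sample values are hypothetical; the sketch only shows how the x·ln(x) term is formed and why variables containing zeros cannot be checked this way.

```python
import math

def box_tidwell_term(x: float) -> float:
    """Return x * ln(x), the extra term added for a continuous variable."""
    if x <= 0:
        raise ValueError("x * ln(x) is undefined for x <= 0")
    return x * math.log(x)

def can_use_box_tidwell(values) -> bool:
    """True when every observed value is strictly positive."""
    return all(v > 0 for v in values)

ages = [17, 18, 21, 25]      # made-up values
bonus = [0, 0, 1500, 0]      # made-up values; zeros, as described for BONUSAMT
print([round(box_tidwell_term(a), 3) for a in ages])
print(can_use_box_tidwell(ages), can_use_box_tidwell(bonus))
```

The model is then refit with the extra column included, and the significance of its coefficient is examined.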

A technique proposed by Hosmer and Lemeshow [Ref. 8:p.
90] was used in identifying the need to introduce new, higher-order
variables in the model as a scaling method for those
variables indicating possible non-linearity. The range of each
of these independent continuous interval variables was broken
into groups and treated as a class (categorical) variable.
Each case was assigned to the categorical class that
represented its range in the original interval scale. The
group representing the lowest scaled values served as the
referent group. A model was fit to the same 10% random sample
of the database using univariate logistic regression with only
the one categorical variable. We then plotted the estimated
coefficients for the levels of the categorical variable versus
the group midpoint values from the initial interval scale. We
chose the most logical shape for the scaling of the
independent variable.
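
A minimal sketch of this grouping technique follows. It relies on the fact that, for a logistic regression containing only one categorical variable coded against a referent group, each fitted coefficient equals the difference between that group's observed log odds and the referent group's observed log odds, so no iterative fitting is needed here. The function name, bin edges, and data are hypothetical, and every group is assumed to contain both outcomes.

```python
import math
from collections import defaultdict

def grouped_logit_coefficients(x, y, edges):
    """Return (group midpoint, coefficient relative to the lowest group)."""
    counts = defaultdict(lambda: [0, 0])      # group index -> [n with y=0, n with y=1]
    for xi, yi in zip(x, y):
        for k in range(len(edges) - 1):
            if edges[k] <= xi < edges[k + 1]:
                counts[k][yi] += 1
                break

    def log_odds(k):
        n0, n1 = counts[k]
        return math.log(n1 / n0)

    referent = log_odds(0)                    # lowest-valued group is the referent
    return [((edges[k] + edges[k + 1]) / 2, log_odds(k) - referent)
            for k in range(len(edges) - 1)]

x = [0, 0, 1, 1, 2, 2, 3, 3]                  # made-up interval values
y = [1, 0, 0, 0, 1, 1, 0, 0]                  # made-up DEP loss indicators
points = grouped_logit_coefficients(x, y, edges=[0, 2, 4])
print(points)
```

Plotting the second element of each pair against the first gives the kind of curve used to judge whether a linear, quadratic, or other shape is appropriate.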

Figure 4 illustrates the results of using this 
technique on EDYRS (years of education). The unusual shape of 
the curve suggested that those in the DEP with eleven years of 
education had a higher probability of becoming a DEP loss. 
Likewise, DEP members with substantially more or less than 
eleven years of education appear to be at a greater risk of 
DEP loss relative to those with only several years more or 
less than eleven years of education. 

We created a new variable, EDYRS2, representing
|EDYRS - 11|, the distance from eleven years of education. Model
log-likelihood, covered in more detail in Chapter VI, was used
to compare the improvement of introducing new higher-order
terms. The larger the model log-likelihood statistic, the more
likely it is that, if the fitted model is the correct model, the
observed results would be obtained given the estimated
parameters, $\hat{\beta}$. Univariate analysis indicated that EDYRS2 alone
more than doubled the model log-likelihood over EDYRS by
itself.

Figure 4 Hosmer-Lemeshow Scale Analysis on EDYRS (scale test:
regression coefficients versus group midpoints)

The same Hosmer-Lemeshow grouping technique was used
for EDYRS2 to determine the need for introduction of higher-order
terms. Figure 5 depicts this new assessment. This curve
appeared to be quadratic in the logit. A quadratic term,
EDYRS22 = (EDYRS2)^2, was added to the model. The model
containing EDYRS2 and EDYRS22 doubled the model log-likelihood
again and was more than four times larger than the model
containing EDYRS alone. Similar analyses were conducted on the
other seven continuous variables for which non-linearity in
the logit was suspected. Five of these seven assessments
resulted in the scaling depicted in Table VII.

Figure 5 Hosmer-Lemeshow Scale Analysis on EDYRS2 (scale test,
EDYRS distance from 11: regression coefficients versus group
midpoints)
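
The two derived education variables can be written out directly. The helper name `scale_edyrs` is hypothetical; the formulas are those given in the text.

```python
def scale_edyrs(edyrs: float):
    """Return (EDYRS2, EDYRS22) for one value of EDYRS."""
    edyrs2 = abs(edyrs - 11)      # distance from eleven years of education
    edyrs22 = edyrs2 ** 2         # quadratic term in that distance
    return edyrs2, edyrs22

for years in (9, 11, 12, 16):
    print(years, scale_edyrs(years))
```

Note that cases equally far above and below eleven years receive identical values of both variables.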

As a result of the addition of these higher-order
terms, the variables BONUSAMT, DOD, DOD2, and DOD3 were
eliminated from the model using backward stepwise elimination.
The same criteria, including $P_{out}$ = .20, as in forward stepwise
selection were used.


Table VII RESULTS OF HOSMER-LEMESHOW SCALING

ORIGINAL     SCALING        NEW                         IMPROVEMENT (1)
VARIABLE     RESULTS        VARIABLES

TIMEDEP      [illegible]    TIMEDEP3 = (TIMEDEP)^3      [illegible]
DEPEND       [illegible]    DEPEND3 = (DEPEND)^3        [illegible]
CONPER       [illegible]    CONPER2 = (CONPER)^2        [illegible]
DOD          CUBIC          DOD2 = (DOD)^2              445 %
                            DOD3 = (DOD)^3

NOTES: 1. Improvement is the percent increase in the model log-likelihood of the fitted
model containing the new higher-order variables over a fitted model containing only the original
variable.

C. MODEL EXECUTION

The final DEP loss model contained 23 interval scaled
variables, five categorical (class) variables represented by
101 dummy variables, one interaction term with 12 levels, and
the constant term. The total number of coefficients estimated,
components of the $\hat{\beta}$ vector, was 137. Table VIII and Appendix
A, Tables XVII through XX contain the variables in the final
model, their estimated coefficients, $\hat{\beta}$, and their significance
levels based on the Wald statistic. A 25% sample (170,685
cases) was used for estimating the final model's coefficients.
Estimation of $\hat{\beta}$ with this sample size required the maximum
available scratch workspace and almost 20 hours of CPU time on
an Amdahl 5990-500 mainframe computer.


Table VIII RESULTS OF FINAL MODEL

VARIABLE        ESTIMATED COEFFICIENT    SIGNIFICANCE

TIMEDEP         .0000                    [illegible]
SEX             [illegible]              [illegible]
DEPEND2         [illegible]              [illegible]
EDUC            NOTE 1                   [illegible]
RACE by EDUC    NOTE 1                   .000
CONSTANT        [illegible]              [illegible]
[remaining rows illegible]

NOTES: 1. The estimated coefficients for these class variables were not presented in
this table due to their large number of levels. They are located in Appendix A, Tables XVII
through XX.

VI. ASSESSING MODEL FIT

A known problem with the use of logistic regression models 
is the difficulty in assessing the fit of the computed model. 
Concerning logistic regression, Dr. Stephen Fienberg says,
"But as long as some of the predictors are not categorical, we
cannot carry out an omnibus goodness-of-fit test for a model."
[Ref. 10:p. 104]. Our fitted model contains 23 interval (non-categorical)
variables. Even though we acknowledge this stated
difficulty, we attempted to use several known methods to
assess the fit of our model. We pursued this effort in the
hope of gaining insight into our model's strengths and
weaknesses.

A. LOG-LIKELIHOOD 

The SPSS software uses the log-likelihood method to assess 
the quality of fit of the logistic regression model. With this 
method, one determines the likelihood of the observed results 
as a function of the parameter estimates. Since this 
likelihood is a small value, between zero and one, -2 times 
the log of the likelihood is used (-2LL). Additionally, the 
reason -2LL is used is that it is asymptotically Chi-Square 
distributed. A good model results in a high likelihood or, 
equivalently, a small value for -2LL. [Ref. 9:p. 52] 
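
The -2LL quantity can be computed directly from the observed outcomes and the estimated probabilities. The Python sketch below uses made-up values, and the function name is hypothetical; it is not the SPSS routine.

```python
import math

def minus_two_log_likelihood(y, p_hat):
    """-2 times the log-likelihood of observed 0/1 outcomes y given p_hat."""
    log_l = sum(math.log(p) if yi == 1 else math.log(1.0 - p)
                for yi, p in zip(y, p_hat))
    return -2.0 * log_l

y = [1, 0, 0, 1]                  # made-up outcomes (1 = DEP loss)
good = [0.9, 0.1, 0.2, 0.8]       # probabilities close to the outcomes
poor = [0.5, 0.5, 0.5, 0.5]       # uninformative probabilities
# A better-fitting set of probabilities gives a smaller -2LL.
print(minus_two_log_likelihood(y, good) < minus_two_log_likelihood(y, poor))
```

The model Chi-Square discussed below is the reduction in -2LL obtained when moving from the constant-only model to the fitted model.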

Under the null hypothesis that our theoretical model fits
perfectly, the value -2LL is from a Chi-Square distribution
with N - p = 170,548 degrees of freedom. Here, N is the number
of cases in our 25% sample (170,685) and p is the number of
parameters estimated (137). The log-likelihood assessment
output from SPSS is depicted in Table IX.


Table IX MODEL LOG-LIKELIHOOD FROM SPSS

                     CHI-SQUARE                DF             SIGNIFICANCE

-2LL                 33,421.7                  [illegible]    [illegible]
MODEL CHI-SQUARE     [illegible: 35;7512.5]    137            .0000


The extremely small significance level for -2LL indicates
our model is not a perfect model. The probability that such 
results would be obtained with the correct model is nearly 
zero. The model Chi-Square is used to test the null hypothesis 
that the coefficients of all the variables in the model are 
zero. The small significance level computed for the model Chi- 
Square indicates that not all of these coefficients are zero. 
As noted in the T-tests of Chapter IV, we acknowledge that 
since the sample size is so large, the null hypothesis that 
the coefficients are zero will almost always be rejected. 
Though the null hypothesis of perfect fit of the model was
rejected, the null hypothesis that the coefficients are all
zero was also rejected.

B. PEARSON CHI-SQUARE

Hosmer and Lemeshow [Ref. 8:p. 140-145] developed a method
for assessing the fit of logistic regression models using a
test statistic similar to the Pearson Chi-Square test
statistic. The strategy entails grouping the cases by their
estimated probabilities, $\hat{\pi}$. Due to our large sample size, we
used 20 groups with approximately 8,543 cases per group. The
first group contained the 8,543 smallest $\hat{\pi}$ values, the second
group the next largest 8,543 values, and so on.

For the y=1 row, representing all contracts that resulted
in DEP loss, the expected number of DEP loss contracts for
each of the 20 groups was obtained by summing the estimated
probabilities of DEP loss, $\hat{\pi}$, for all the members of each of
the corresponding 20 groups. The observed values for each of
the 20 groups in this row are the number of observed DEP loss
contracts within the respective group ($y_i$ = 1).

With the y=0 row, representing all contracts that resulted
in accession, the expected number of accessions for each of
the 20 groups was obtained by summing one minus the estimated
probability of DEP loss, $\hat{\pi}$, for all the members of each of the
corresponding 20 groups. The observed values for each of the
20 groups in this row are the number of observed contracts
that resulted in accession within the respective group ($y_i$ = 0).


Table X displays the results of these calculations.

Table X HOSMER-LEMESHOW GOODNESS OF FIT TABLE

[Table body illegible. Rows: OBSERVED, EXPECTED, and TEST STAT for the
y=1 (DEP loss) row; OBSERVED and EXPECTED for the y=0 (accession) row.
Columns: the 20 groups.]

The Hosmer-Lemeshow goodness of fit statistic C is defined
as follows:

    C = \sum_{l=1}^{20} \frac{(\mathrm{OBSERVED}_l - \mathrm{EXPECTED}_l)^2}{\mathrm{EXPECTED}_l \, (1 - \bar{\pi}_l)}

    where \bar{\pi}_l = \frac{1}{N_l} \sum_{j=1}^{N_l} \hat{\pi}_j , \quad l = 1, \dots, 20 ; \quad j = 1, \dots, N_l

    with N_l = number of cases in group l


Hosmer and Lemeshow demonstrated that if the fitted logistic
regression model is the correct model then C has an
approximate $\chi^2$ distribution with 20 - 2 = 18 degrees of
freedom. The critical value, $\chi^2_{18}$ ($\alpha$ = .05), is 28.87. The
groups' contributions to the test statistic C are displayed in
Table X. These sum to a number much greater than 28.87. This
indicates our model has significant lack of fit. An advantage
of a summary test statistic like C is that it provides insight
into the model's fit over the 20 levels of DEP loss risk [Ref.
8:p. 144]. This model appears to fit reasonably well for those
individuals that access ($y_i$ = 0) in all groups except the
bottom 10% (first two groups) and the top 5% constituting the
twentieth group. Though the model in its entirety does not
fit well as measured by C, there appears to be potential for
using its relatively good fit in all of the groups, except for
these extreme groups, for predictive purposes.
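
The grouping and the statistic C can be sketched in Python as below. This is a minimal illustration of the calculation as defined in this section, using the y = 1 row only; the function name and the eight made-up cases are hypothetical, and the number of cases is assumed to be at least the number of groups.

```python
def hosmer_lemeshow_c(y, p_hat, groups=20):
    """Return (C, per-group contributions) using the y = 1 row of the table."""
    pairs = sorted(zip(p_hat, y))                 # order cases by estimated probability
    n = len(pairs)
    contributions = []
    for k in range(groups):
        chunk = pairs[k * n // groups:(k + 1) * n // groups]
        observed = sum(yi for _, yi in chunk)     # observed DEP losses in the group
        expected = sum(pi for pi, _ in chunk)     # sum of estimated probabilities
        mean_p = expected / len(chunk)            # group mean of estimated probabilities
        contributions.append((observed - expected) ** 2
                             / (expected * (1.0 - mean_p)))
    return sum(contributions), contributions

p_hat = [0.25] * 4 + [0.75] * 4                   # made-up estimated probabilities
y = [1, 1, 0, 0, 1, 1, 1, 0]                      # made-up DEP loss indicators
c_stat, parts = hosmer_lemeshow_c(y, p_hat, groups=2)
print(round(c_stat, 4), [round(v, 4) for v in parts])
```

Inspecting the individual contributions, rather than only their sum, is what allows the misfit to be traced to particular risk groups.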

Figure 6 illustrates how this misfit in the first two and
the last group impacted the value of C, leading to rejection of
the hypothesis of model fit. With a perfect model, the 20
group means of the estimated probabilities of DEP loss, $\hat{\pi}$,
would equal the corresponding relative frequencies of the
numbers of observed values of DEP loss ($y_i$ = 1), to within
random error. This would be represented by the line y = x. The
curve corresponding to the fitted model appears to differ from
the line y = x only for the extreme groups.


Figure 6 Hosmer-Lemeshow Goodness of Fit Plot (group means of
estimated probabilities versus relative frequency, for the
model data and a perfect model)

C. PREDICTION PLOT

An intuitive, alternative method for assessing the fit of 
the developed model is the prediction plot. Figure 7 shows 
smoothed histograms of the estimated probability of DEP loss, 
$\hat{\pi}$, for both the accession and DEP loss groups. The curve for
the accession group is the plot of residuals; that for the DEP
loss group is a plot of one minus the residuals. Relative
frequencies were plotted due to the large quantity of
accession cases in comparison to the number of DEP losses.

The developed model's lack of fit is evident in the rise
of the DEP loss curve to the left of $\hat{\pi}$ = .4 and the low values
of the same curve on the extreme right. The large area under
the DEP loss curve in the region of .6 < $\hat{\pi}$ < .9 appeared to
indicate that the model fit well for conditions giving
estimated DEP loss probabilities in this region. However, the
curve for accessions indicated the model accurately classified
those that accessed. As desired, the majority of those that
accessed were assigned a probability of DEP loss, $\hat{\pi}$, near
zero.

Though two different statistical tests indicate that the
entire model was significantly different from a perfect model,
closer examination reveals that the model we developed appears
to perform satisfactorily for the accession and DEP loss cases
in most conditions. In the next chapter, we examine the
model's effectiveness in the context of its intended use for DEP
loss prediction.

Figure 7 Prediction Plot for Accession and DEP Loss

VII. MODEL USAGE


A. RED, AMBER, GREEN 
1. Classification Criteria 

As mentioned in Chapter I, USAREC currently uses a 
red, amber, green coding scheme for recruiters to classify 
their DEP members according to perceived DEP loss risk. This 
model could provide a similar classification, augmenting the
recruiter's firsthand knowledge of DEP members. This could
prove especially helpful in classifying newly contracted DEP
members, before the recruiter develops a relationship with the
DEP member.

By computing and adjusting two threshold values of $\hat{\pi}$,
we can control which of these three groups a DEP member is
assigned. In determining these threshold values of $\hat{\pi}$, we used
the following criteria. The first criterion was that no more
than one half of the DEP members would be placed in the amber
group. This group is made up of the DEP members that the
threshold rule will not classify as a predicted DEP loss or
accession. The utility of the rule would be in question if it
placed an unusually large number of DEP members in this group.
USAREC could easily change this restriction on the proportion
classified amber by adjusting the threshold values. The second
criterion was to maximize the model's accuracy in the
classification of DEP members into the red and green categories.
2. GREEN Classification 

The classification of a DEP member as green by the 
threshold value would alert the recruiter that this individual 
is not predicted to be a DEP loss. Figure 8 illustrates the 
power of the model with respect to the green category. We 
determined the predictive power of the fitted model is best 
represented by its accuracy of prediction. The predictive 
power for the green category increased significantly as the 
percentage of the total population classified green declined. 
Since approximately 88% of the model population accessed, an 
accuracy of 88% would have been achieved if all DEP members 
were classified as green. The power curve begins to flatten 
out as it approaches 50% classification green and rises no 
higher than 96.8% accurate at about 45% classification green. 
We decided to use the slightly smaller accuracy of 96.7% due 
to the significantly larger classification rate of 53.6% 
green. 
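
The trade-off between classification size and accuracy for the green category can be sketched as follows. The function name and data are hypothetical; the sketch only shows how one point on such a power curve could be computed for a given threshold.

```python
def green_power(y, p_hat, threshold):
    """Return (share classified green, accuracy among those classified green)."""
    green = [yi for yi, p in zip(y, p_hat) if p <= threshold]
    if not green:
        return 0.0, None
    share = len(green) / len(y)
    # A green classification is correct when the DEP member accessed (y = 0).
    accuracy = sum(1 for yi in green if yi == 0) / len(green)
    return share, accuracy

y = [0, 0, 0, 1, 0, 1]                          # made-up outcomes (1 = DEP loss)
p_hat = [0.01, 0.03, 0.05, 0.20, 0.40, 0.90]    # made-up estimated probabilities
for t in (0.06, 0.50, 1.00):
    print(t, green_power(y, p_hat, t))
```

Classifying every case green (a threshold of 1.00) reproduces the overall accession rate, which is the baseline against which the curve should be read.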

As indicated in Figure 8, the cutoff threshold to
maximize green classification accuracy was determined to be
$\hat{\pi}(X) \le .06$. A high accuracy is desired in the green
classification because a misclassification might result in a
DEP member not receiving needed extra recruiter attention.

Figure 8 Model Power Green Classification (green category:
percent accurately classified versus classification size)

3. RED Classification

The classification of a DEP member as red by the
threshold value would alert the recruiter that this individual
is a high DEP loss risk and predicted to be a DEP loss.
Similar to the green classification's plotted power, Figure 9
illustrates the predictive power of the fitted model with
respect to classification into the red category. As in the
power of the green classification, the accuracy significantly
improved as the percent of the population classified red
decreased. The accuracy peaked at 89.6% with a classification
of about 4% of the population as red.

Though this accuracy is not as high as that of the 
green classification, this still appears to be a strong 
prediction accuracy due to the small percentage (12%) of the 
population that eventually became a DEP loss. For comparative 
purposes, the accuracy would have been only about 12% if 100% 
of the population was classified red. Additionally, an error 
in this prediction may only result in a recruiter paying 
closer attention to a DEP member who may have accessed without 
the attention. As indicated in Figure 9, the cutoff threshold
used to maximize the accuracy of those classified red was
$\hat{\pi}(X) \ge .70$.
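
The resulting threshold rule can be expressed compactly. The sketch below uses the two cutoff values reported in this chapter (.06 and .70); the constant and function names are hypothetical.

```python
GREEN_MAX = 0.06   # green when the estimated probability is at most .06
RED_MIN = 0.70     # red when the estimated probability is at least .70

def classify(p_hat: float) -> str:
    """Assign a red, amber, or green category from an estimated probability."""
    if p_hat <= GREEN_MAX:
        return "GREEN"
    if p_hat >= RED_MIN:
        return "RED"
    return "AMBER"

for p in (0.02, 0.06, 0.30, 0.70, 0.95):   # made-up estimated probabilities
    print(p, classify(p))
```

Moving either constant changes the share of DEP members left in the amber group, which is how the 50% restriction could be tightened or relaxed.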

4. Final Results 

As a result of the selection of these thresholds, the
final model classified the data used to fit the model as
depicted in Table XI. This table indicates that less than 50%
(42.45%) of the population was classified as amber. As
previously mentioned, the classification accuracy was strong
even when constrained by no more than 50% being classified as
amber. The overall classification accuracy of the threshold
rule for those DEP members that eventually did access was
99.2%; it was 66.6% for those that were DEP losses.

Figure 9 Model Power Red Classification (red category:
percent accurately classified versus classification size)

Table XI MODEL DATA CLASSIFICATION TABLE

                  GROUP / CRITERIA / (PERCENT OF POPULATION)

OBSERVED          GREEN                  AMBER                        RED                    PERCENT (1)
                  $\hat{\pi}(X) \le .06$    $.06 < \hat{\pi}(X) < .7$      $.7 \le \hat{\pi}(X)$     CORRECT BY $Y_i$
                  (53.6 %)               (42.45 %)                    [illegible]

0 ACCESSION       88,544                 62,141                       [illegible]            [illegible]
1 DEP LOSS        [illegible]            10,606                       [illegible]404         [illegible]

PERCENT CORRECT   [illegible]                                         [illegible]
BY GROUP

NOTE: 1. The calculation for correct by $Y_i$ does not include those classified as amber.


B. VALIDATION 

The final test conducted was the validation of the fitted 
model on a new data set. The method of maximum likelihood 
ensured that the coefficients in $\hat{\beta}$ were estimated so as to
make the observed cases in the model data set as likely as 
possible. Hence, it was expected that the fitted model would 
perform in an optimistic manner on the model data set. 
Regression models with many explanatory variables at times 
become overly reliant on the data used to fit the model by 
selecting as significant, covariate patterns unique to the 
model data set. [Ref. 8:p. 171] 

The original data set that was used to fit the model was
a random 25% sample (170,685 cases) from the database of all
enlistment contracts signed in FY 86 through FY 90. The new
data set used for validation of the fitted model was a new
random 25% sample (171,809 cases) from the same database.
Validation was conducted by calculating the logit, g(X), using
the estimated coefficients from the fitted model, $\hat{\beta}$, in a
linear combination with the covariates from the new 25% sample
in equation (3). These values were then substituted into the
logistic function, equation (2), resulting in corresponding
estimated probabilities, $\hat{\pi}$.
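
Scoring a validation case with previously estimated coefficients can be sketched as follows. The coefficient values and covariates are made up for illustration, and `estimated_probability` is a hypothetical name; no coefficients are re-estimated in this step.

```python
import math

def estimated_probability(beta_hat, covariates):
    """Apply equation (3), then equation (2), to one new case."""
    g = beta_hat[0] + sum(b * x for b, x in zip(beta_hat[1:], covariates))
    return 1.0 / (1.0 + math.exp(-g))

beta_hat = [-2.0, 0.5, 1.0]                                 # hypothetical coefficients
validation_cases = [[0.0, 0.0], [2.0, 1.0], [4.0, 3.0]]     # hypothetical covariates
print([round(estimated_probability(beta_hat, c), 3) for c in validation_cases])
```

The resulting probabilities can then be compared with the observed outcomes of the validation sample, exactly as was done for the model data set.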

Figures 10 and 11 illustrate the predictive power of the
model on a new data set as compared to the model data set. The
power of the green classification on the validation data set
was almost as strong as for the model data set. The maximum
accuracy is obtained at the same $\hat{\pi}$ threshold with less than a
.1% decrease in accuracy.

Likewise, the model performed well with the validation 
data set in red classification. As Figure 11 illustrates, the 
predictive power of the model on the validation data set was 
almost identical to that for the model data set. The 
validation data set resulted in higher prediction accuracies 
than the model data set when lower percentages of the 
validation data set were classified red. 

The results of the validation effort indicate that the
model is not overly reliant on the model data in either green
or red classifications. Table XII summarizes the final
classification results for the validation data set. Only a
slightly larger percentage of individuals were classified as
amber using the validation data set, still less than the
criterion of 50%. The red and green classification accuracies
for the validation data set are only marginally smaller than
those for the model data set. These results indicate that our
model has excellent potential for predicting DEP loss outcomes
for future DEP members.

Figure 10 Green Validation Power (green category: percent
accurately classified versus classification size)

Figure 11 Red Validation Power

Table XII VALIDATION DATA CLASSIFICATION TABLE

                  GROUP / CRITERIA / (PERCENT OF POPULATION)

                  GREEN                  AMBER                        RED                    PERCENT (1)
                  $\hat{\pi}(X) \le .06$    $.06 < \hat{\pi}(X) < .7$      $.7 \le \hat{\pi}(X)$     CORRECT BY $Y_i$
                  (52.9 %)               (43.02 %)                    (3.89 %)

ACCESSION         87,795                 [illegible]                  698                    [illegible]
DEP LOSS          [illegible]            [illegible]                  [illegible]            [illegible]

PERCENT CORRECT   [illegible]                                         89.57 %
BY GROUP

NOTE: 1. The calculation for correct by $Y_i$ does not include those classified as amber.

VIII. RECOMMENDATION AND CONCLUSIONS


A. RECOMMENDATIONS 

Modeling human behavior is a difficult process because 
there are so many unknown and unmeasurable factors which 
ultimately affect the dependent variable being modeled. 
Modeling of the DEP loss process is no exception. Therefore, 
recommendations that follow focus on obtaining data that could 
possibly act as significant explanatory variables in a refined 
DEP loss model. 

The RENO variable used in this study indicated whether the 
enlistment contract had been renegotiated while the recruit 
was in the DEP. Though obtainable through indirect means, the 
USAREC Minimaster database does not describe the renegotiation 
process beyond a binary (yes,no) variable. Whether the 
renegotiation was a date change, training change, or job 
change might be significant information. 

National and local advertising have long been considered 
key recruiting tools by USAREC. Analysts at USAREC have been 
asked in the past to quantitatively demonstrate the 
relationships between advertising expenditures and successful 
recruiting operations. The NATADV and BDEADV variables used in
this fitted model were aggregated to the fiscal year. These
advertising variables were not for a specific media type such
as television, radio, or newspaper. More detailed, historical
advertising information down to the Recruiting Battalion level
by time and media type could be valuable in developing a
refined DEP loss model.

USAREC uses promotion incentives such as the E-2 referral
program. DEP members who refer candidates who later sign a
contract are rewarded with an advanced promotion to E-2 upon
entering active duty. This has proven to be a valuable
recruiting tool with respect to generating contract leads. The
effect that this program may have on the DEP loss process was
not modeled here due to inaccessibility of the data. Inclusion
of this information in the USAREC Minimaster database could
significantly assist in development of an improved DEP loss
model.


B. CONCLUSIONS 

This modeling effort has attempted to quantify the complex 
DEP loss process involving many known explanatory variables. 
Though the model in its entirety did not fit well as measured 
by two statistical tests, for certain levels of estimated 
probability of DEP loss, nm, the model appeared to fit well. 
An important test of any model that might be used for 
predictive purposes is its validation. We demonstrated that 
our model performed satisfactorily on a validation data set 
obtained by taking a new 25% random sample from the database. 
With a resource management tool as important as the DEP, 
a modeling effort that displays some success in predicting DEP 
loss should be pursued. We conclude that this model could 
prove useful in assisting recruiters in assessing DEP loss 
risks of individual recruits. 
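
The validation step described above can be sketched as follows: draw a 25% random sample, assign each recruit to a red, amber, or green class by estimated probability of DEP loss, and compare the mean estimated probability with the observed loss rate in each class. This is a minimal sketch only. The cutoff values, function names, and record layout are assumptions made for illustration and are not USAREC's actual thresholds or the procedure used in this study.

```python
import random

# Hypothetical cutoffs for illustration only; not USAREC's actual thresholds.
AMBER_CUTOFF = 0.10
RED_CUTOFF = 0.20


def risk_class(p_loss):
    """Map an estimated probability of DEP loss to a risk class."""
    if p_loss >= RED_CUTOFF:
        return "red"
    if p_loss >= AMBER_CUTOFF:
        return "amber"
    return "green"


def draw_validation_sample(records, fraction=0.25, seed=0):
    """Draw a simple random sample (without replacement) from a list."""
    rng = random.Random(seed)
    return rng.sample(records, round(len(records) * fraction))


def summarize_by_class(records):
    """Summarize (estimated_probability, is_dep_loss) pairs by risk class.

    Returns {class: (count, mean estimated probability, observed loss rate)}.
    """
    totals = {}
    for p_loss, is_loss in records:
        cls = risk_class(p_loss)
        n, p_sum, losses = totals.get(cls, (0, 0.0, 0))
        totals[cls] = (n + 1, p_sum + p_loss, losses + int(is_loss))
    return {c: (n, p_sum / n, losses / n) for c, (n, p_sum, losses) in totals.items()}
```

If the model validates well, the observed loss rate in each class of the sample should lie close to the mean estimated probability for that class, and should rise from green to amber to red.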
APPENDIX A 


Table XIII CAREER MANAGEMENT FIELD DEP LOSS ANALYSIS 
Table XIV CAREER MANAGEMENT FIELD DEP LOSS (CONTINUED) 


CLASS VARIABLE | VARIABLE DESCRIPTION | PERCENT ACCESSION (1) | PERCENT DEP LOSS (1) 

[Table body not legible in the source. Recoverable row labels: CMF 88, CMF 91, CMF 93, CMF 96, and TOTAL (total contract percentages). Recoverable values: 3.6% on the CMF 88 row and 7.3% on the CMF 91 row; the column for each value is uncertain.] 





NOTE: 
1. Cell difference significance less than .00005, Chi-square test. 
2. This is not a real CMF but only a surrogate "holding" CMF for a known DEP loss who is not being carried on 
record as a DEP loss. Discussed in Chapter IV. 
Table XV RECRUITING BATTALION DEP LOSS ANALYSIS 
Table XVI RECRUITING BATTALION DEP LOSS (CONTINUED) 


VARIABLE DESCRIPTION | PERCENT ACCESSION (1) | PERCENT DEP LOSS (1) 
Table XVIII ESTIMATED COEFFICIENTS FOR CMF 
Table XIX ESTIMATED COEFFICIENTS FOR BN 
Table XX ESTIMATED COEFFICIENTS FOR BN (CONTINUED) 


VARIABLE | ESTIMATED COEFFICIENT B | SIGNIFICANCE LEVEL 


BN | RECRUITING BATTALION 